perm filename BIRD[F82,JMC] blob
sn#688553 filedate 1982-12-06 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00008 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 bird[f82,jmc] Minsky's bird example - for circumscription
C00005 00003 Introduction
C00016 00004 Reasoning:
C00026 00005 A SOMEWHAT GENERALIZED FORM OF CIRCUMSCRIPTION
C00041 00006 References:
C00043 00007 Remarks:
C00050 00008 Problems to formalize
C00055 ENDMK
C⊗;
bird[f82,jmc] Minsky's bird example - for circumscription
A BIRD CAN FLY - UNLESS IT IS PREVENTED FROM FLYING
Abstract:
In his 1982 presidential address to the American Association
for Artificial Intelligence Marvin Minsky again expressed his
opinion that logic, including non-monotonic logic, wasn't suitable
for expressing common sense knowledge and common sense reasoning.
He illustrated this with the example of the common sense facts
about birds being able to fly except for ostriches, dead birds
and whatever other possible exceptions which anyone might chance to
think of.
The original purpose of this article was just
to take up Minsky's challenge
on behalf of one method of non-monotonic reasoning - the
circumscription of (McCarthy 1980). However, in order to do
the example properly, and I hope convincingly, it was necessary
to make certain improvements and generalizations in circumscription,
and therefore the content of this paper is technical as well
as expository and polemical.
Mainly, it is necessary to include in the knowledge
base some information about the order in which circumscriptions are
normally to be done. Also we describe circumscription of formulas
and not just of predicates.
Introduction
If the only means of reasoning proposed is logical deduction,
it seems impossible to express the known common sense reasoning
about the ability of birds to fly. The problem is that the simple axiom
∀x.is-bird x ⊃ can-fly x
admits exceptions and requires qualification. If we qualify it by writing
∀x. is-bird x ∧ ¬is-ostrich x ⊃ can-fly x,
we are immediately invited to add many more qualifications including some
that no-one has ever heard of before. Since no-one has heard of them
before, it is implausible that they can be part of every human's
common sense knowledge, and it seems unhopeful to make the success of
artificial intelligence depend on finding them all and putting them
in a data base.
This "qualification problem", discussed in
(McCarthy 1977) is one of the motivations for introducing
formalized non-monotonic reasoning.
Some researchers, including Minsky, take this as
evidence that representation of facts in logical languages and logical
reasoning are not suitable for common sense reasoning in AI and have
proposed other formalisms. However, as Nilsson (1981) has argued,
the other formalisms suffer from similar difficulties, usually lead to
representations that are very specialized, and often turn out to be
equivalent to subsets of first order logic.
The advantages of using logical languages to express facts
include the following:
1. Facts can be expressed independently of purpose so that the
same facts can be used in deciding how to achieve different goals
including goals that were not anticipated when the facts were discovered.
2. Collections of facts have logical consequences, and part of
common sense consists of having available sufficiently obvious consequences
of the facts one is told and one's general common sense knowledge.
3. Mathematical logicians and computer scientists have
accumulated much knowledge aobut logical formalisms (mainly first order logic),
and this knowledge provides a good basis for the development of
reasoning programs. These and other virtues are
recounted in (McCarthy 1960), (McCarthy and Hayes 1969), (Hayes 197x),
Nilsson (1981) and (Newell 1980). Since approximately 1977 various
authors have proposed to get around the qualification problem (noticed
in various guises)
by supplementing logical deduction by various
kinds of formalized non-monotonic reasoning. One such proposal, called
circumscription, is described in (McCarthy 1980), and the object of this
note is to express the common sense knowledge about the ability of birds
to fly as a first order axiom and to show how circumscription may be used
to draw the conclusions expected from common sense and not others. Thus
we propose to meet one of the challenges expressed in Minsky's address.
The basic idea is to use an axiom
1. ∀x.is-bird x ∧ ¬prev-can-fly x ⊃ can-fly x
where prev-can-fly is a new predicate such that prev-can-fly x asserts that
x is prevented from being able to fly.
If we only allowed deductive reasoning,
prev-can-fly wouldn't help, because its use would amount merely to asserting
that a bird can fly unless it can't. However, we use circumscription to
infer that a bird is not prevented from flying unless there is information
permitting the inference that it is prevented. This amounts to asserting
that prev-can-fly has the minimal extension compatible with the facts about
it that we take into account. While prev-can-fly x can be read "x is
prevented from flying", its role is mainly technical in the theory, and no
claims are made about what it "really means". The important issue is
whether the system that uses it will draw reasonable conclusions in a
sufficiently wide variety of circumstances.
Since 1 can also be written
1'. ∀x. is-bird x ⊃ prev-can-fly x ∨ can-fly x,
it is clear that we shall need some mechanism for treating
prev-can-fly and can-fly differently.
We work with the following additional axioms regarded as representing the
relevant part of a hypothetical data base of common sense knowledge.
The reader should consider to what extent he accepts these assertions as expressing
general common sense knowledge and not ad hoc to a particular problem.
2. ∀x.is-penguin x ⊃ is-bird x
∀x.is-ostrich x ⊃ is-bird x
∀x.is-canary x ⊃ is-bird x
The facts that penguins and ostriches are birds don't help in reaching
the conclusion that Tweety (a canary) can fly, but they are part of
the common sense data base, and it is important to show that they do no
harm either.
3. ∀x.is-ostrich x ⊃ prev-can-fly x
∀x.is-penguin x ⊃ prev-can-fly x
∀x.dead x ⊃ prev-can-fly x
The qualification problem arises because there can be
any number of facts like those listed under 3. When a new one is remembered
or discovered or conjectured, it can be added to the above list.
Through the magic of circumscription, such a new fact will interfere with
our concluding that Tweety can fly only if there is positive evidence that
the new circumstance applies to Tweety.
4. ∀x.prev-can-fly x ⊃ ¬can-fly x
Presumably this is also part of common sense knowledge.
5. is-ostrich Joe
is-canary Tweety
is-rock Henry
These are special facts relevant to a particular situation, not part of
general common sense knowledge.
REASONING BY CIRCUMSCRIPTION
We begin by presenting the circumscriptive reasoning so as
to reach our goals.
Goals:
The ground facts we wish to come up with are:
is-bird Tweety ; follows deductively
is-bird Joe ; follows deductively
¬is-bird Henry
This requires a circumscription minimizing is-bird, and it also requires the
fact that Henry ≠ Tweety. This involves Reiter's (1979) "unique names
hypothesis", and we shall show how we propose to obtain it by
circumscription.
prev-can-fly Joe ; Not actually used except in connection with 4
; to show positively that Joe can't fly.
¬prev-can-fly Tweety; This requires circumscribing prev-can-fly
and addtional
; circumscriptions to show that Tweety isn't Joe
; or any other ostrich.
can-fly Tweety ; Our main goal
¬can-fly Joe ; A purely deductive conclusion. Since we can imagine
; that ostriches might fly in sufficiently low gravity
; and sufficiently high air density, there really should
; be another potentially disabling predicate in
; axiom 3, so that the part about ostriches would
; become
3'. ∀x.is-ostrich x ∧ ¬prev-prev-can-fly x ⊃ prev-can-fly x
If we are taking into account no facts that might enable one to deduce
that an ostrich can fly after all, circumscribing prev-prev-can-fly will
make it always false, and axiom 3' will reduce to axiom 3.
In what follows we will use 3 rather than 3'.
Reasoning:
There are several circumscriptions we could do. The simplest
would be to circumscribe prev-canfly in the conjunction of the axioms 3.
The circumscription schema is then
[∀x.ostrich x ⊃ P x] ∧ [∀x.penguin x ⊃ P x] ∧ [∀x.dead x ⊃ P x] ∧ [∀x.Px ⊃ prev-canfly x]
⊃ [∀x.prev-canfly x ⊃ Px]
where P is the predicate variable of the schema. Setting
P x ≡ ostrich x ∨ penguin x ∨ dead x
makes each of the four conjuncts of the left side of the schema satisfied,
and so we get
∀x.prev-canfly x ⊃ ostrich x ∨ penguin x ∨ dead x.
Combining this with 3 gives
∀x. prev-canfly x ≡ ostrich x ∨ penguin x ∨ dead x,
asserting that exactly ostriches, penguins and the dead are prevented from
flying. This is the result we want, considering that we are taking into
account only those obstacles to flying,
but what justifies taking only 3
into account and leaving 1 and 4 out of the circumscription?
Actually including 4 in the circumscription presents no problem. We
get an extra premiss
∀x.P x ⊃ ¬ canfly x
in the circumscription schema, and when we substitute for P x the disjunction
mentioned above, this becomes
∀x.ostrich x ∨ penguin x ∨ dead x ⊃ ¬canfly x
which is provable from 3 and 4.
However, including 1 in the circumscription would leave us with
∀x.prev-canfly x ≡ ostrich x ∨ penguin x ∨ dead x ∨ (isbird x ∧ ¬canfly x),
and the last term makes this unusable.
There is another way to get the desired answer and still include
1 in the circumscription formula. Namely, we include 1 but regard
canfly as a variable that can be chosen so as to minimize prev-canfly.
The circumscription schema then becomes.
[∀x.isbird x ∧ ¬P(x) ⊃ Q x] ∧ [∀x.ostrich x ⊃ P x] ∧ [∀x.penguin x ⊃ P x]
∧ [∀x.dead x ⊃ P x] ∧ [∀x.P x ⊃ prev-canfly x]
⊃ ∀x.prev-canfly x ⊃ P x.
Now we can choose P as before and choose Q so as to make the first
premiss true.
We regard this alternative as better than only including 3 in
the circumscription, because we then take all available facts into account.
Moreover, if we had an additional fact ¬canfly(Tom), then the premiss of
the circumscription would have the additional fact ¬Q(Tom), and
we would have to write
P x ≡ x = Tom ∨ ostrich x ∨ penguin x ∨ dead x
in order to satisfy the left side of the schema, and this takes into
account the fact that Tom can't fly without interfering with Tweety.
Circumscription is a rule of conjecture, so logically we could
stop at this point with the conclusion that it allows us to make the
conjecture we want in the case of Tweety. However, since we want computer
programs to come up with the right conjectures, we need some rules
about what should be circumscribed and what predicates should be allowed
to flap in the circumscription. We have our choice as to whether these
rules are imbedded in the reasoning program or are represented by some
kind of meta-sentences. We can even do both. The meta-sentences will
be wanted if we want to do meta-reasoning logically.
We don't have a general proposal at this time.
Here, however is a scheme that handles the present case and some more. We
replace prefly x by the more general prevented(canfly, x) and write
1''. ∀x.isbird x ∧ ¬prevented(canfly,x) ⊃ canfly x.
Here prevented is taken as a second order predicate symbol that
takes a predicate and its arguments as arguments. 3 then becomes
3''. ∀x.ostrich x ⊃ prevented(canfly,x), etc.
Now we can have a general rule that prevented is to be circumscribed
on its individual variable with its predicate variable flapping.
We could also let ostrich, penguin, and dead flap, and take
the facts 2 and 4 involving them into the circumscription. The
result would be that we could take
∀x.P x ≡ x = Joe
in the circumscription schema, since the predicate variables replacing
penguin and dead can be taken as always false.
The conclusion then becomes
∀x.prev-canfly x ≡ x = Joe,
and substituting Tweety into 1 and using 2 and 4 gives
¬(Tweety = Joe) ⊃ canfly Tweety.
The unique names hypothesis
The unique names hypothesis (Reiter 197x) is another form of
non-monotonic reasoning that says that objects with different names are
presumed to be different unless they are provable to be identical.
We prefer to reach this conclusion by circumscription in the following
way. We index the objects by integers writing
6. object(1) = Joe
object(2) = Tweety
object(3) = Henry.
We then write
∀m n.m≠n ∧ ¬prevented(different,m,n) ⊃ object(m) ≠ object(n).
As described above, the predicate prevented is to be circumscribed
with respect to any facts we may have causing numbers to designate
objects that are the same. We assume also that we have enough
arithmetic to be able to prove that the numbers are different -
or (by an attachment) we trust our computer to tell us that they are.
In the present case we will have no facts that allow us to conclude
that any pair of Joe, Tweety and Henry are equal, and so we'll conclude
that they are different.
Notice that writing
6*. number(Joe) = 1
number(Tweety) = 2
number(Henry) = 3
would force Joe, Tweety and Henry to be different rather than making
it a non-monotonic conclusion.
If we imagine the above reasoning to be carried out in complete
isolation from all other facts, then the conclusion that Joe and
Tweety are the only birds (not deduced above but deducible by the
same methods) would be acceptable, but a more rugged reasoning
program should be able to admit the existence of other birds and
flying objects. This suggests the use of some such predicate as
relevant(x,P1) meaning that x is relevant to problem P1 and
relativizing the formulas with respect to this predicate. We haven't
yet figured out how this should be done.
John McCarthy
Computer Science Department
Stanford University
Stanford, CA 94305
Arpanet: JMC@SU-AI
A SOMEWHAT GENERALIZED FORM OF CIRCUMSCRIPTION
It has become apparent that circumscription (McCarthy 1980)
should be regarded as a form of minimization
of logical expressions analogous to more conventional mathematical
minimizations. It is worthwhile emphasizing
the analogies in order to better understand logical minimization.
We begin with minimization in partial ordered sets.
Let f:A → B, and let B be partial ordered by a relation ≤.
If we want to find x ε A absolutely minimizing f(x) subject to
a constraint p(x), x is characterized by
(1) p(x) ∧ ∀y.p(y) ⊃ f(x) ≤ f(y).
If, however, we only require that x be such that no y satisfying
p(y) is larger, we write
(2) p(x) ∧ ∀y.(p(y) ∧ f(y) ≤ f(x) ⊃ f(x) ≤ f(y).
This is entirely conventional and has nothing special to do with
logical minimization. Of course, the "local minima" characterized by
(2) need not exist, and when some do, the absolute minimum characterized
by (1) need not exist. Proving that minima exist is often
the main problem in mathematical minimization, and we may expect it
to be important in logical minimization also. A common sufficient
condition is topological compactness, and perhaps this idea is also
applicable to logical minimization.
In logical minimization, the ordering is obtained from the
trivial ordering
false < true
by quantification over one or more domains. Let E[x] and F[x]
be wffs with x as a free variable. We say
E ≤ F
or perhaps more strictly it should be
λx.E[x] ≤ λx.F[x]
provided
(3) ∀x.E[x] ⊃ F[x].
More generally, we may have tuplets <E1, ... , En> and <F1, ... ,Fn>
with variables in different domains. There are then two useful
combined orderings - parallel and lexicographic. In the parallel case
we write
<E1, ... , En> ≤ <F1, ... , Fn>
provided
[∀x1.E1(x) ⊃ F1(x)] ∧ ... ∧ [∀xn.En(x) ⊃ Fn(x)].
In the lexicographic case, we write
<E1, ... , En> ≤ <F1, ... , Fn>
provided
[∀x1.E1(x) ⊃ F1(x)] ∨ [∀x.E1(x) ≡ F1(x)] ∧ <E2...En> ≤ <F2...Fn>.
Now suppose that the formula E[x] involves some
predicate and individual variables, P1, ... Pk, Q1, ... Ql, y1, ... ym,
and z1, ... zn. We summarize these as P, Q, y, and z and therefore
write E[x,P,Q,y,z]. We suppose that the predicate variables P and
the individual variables y may be chosen so as to achieve the desired
minimum, while the predicate variables Q and the individual
variables z are regarded as parameters so that the minimum obtained
will depend on the Q and the z.
We also suppose that the P, Q, y, and z are constrained to satisfy
some formula
(4) A[P,Q,y,z].
Equation (2) then becomes
(5) A[P,Q,y,z] ∧ ∀P' y'.A(P',Q,y',z) ∧ (∀x.E[x,P',Q,y',z] ⊃ E[x,P,Q,y,z])
⊃ (∀x.E[x,P,Q,y,z] ⊃ E[x,P',Q,y',z]).
When we circumscribe a predicate P, as described in (McCarthy 1980),
then E[x] is just P(x),
and the other predicates enter through the constraining formula (4).
CIRCUMSCRIPTIVE THEORIES
Let us suppress the variables Q and y, i.e. we suppose they
In this section we will keep the predicate Q and the quantity
z constant and drop the variable y leaving the predicate P axiomatized
by the sentence A[P]. (5) then takes the form A'[P] defined by.
(6) A'[P] ≡ A[P] ∧ ∀P'.A[P'] ∧ (∀x.E[P',x] ⊃ E[P,x]) ⊃ (∀x.E[P,x] ⊃ E[P',x]).
We can write
(7) A'[P] ≡ circum[E,A][P]
or
(8) A' = circum[E,A],
which allows us to consider circumscription of an expression E as
an operator on the formula A. It is illuminating to consider two
operations E1*E2 and E1+E2. The former is defined by
(9) circum[E1*E2,A] = circum[E1,circum[E2,A]],
i.e. the result of circumscribing E1 with the axiom A'[P] obtained from
circumscribing E2. E1+E2 is the parallel circumscription, i.e.
(10) circum[E1+E2,A][P] ≡ A[P] ∧ ∀P'.A[P'] ∧ (∀x.E1[P',x] ⊃ E1[P,x])
∧ (∀x.E2[P',x] ⊃ E2[P,x])
⊃ (∀x.E1[P,x] ⊃ E1[P',x]) ∧ (∀x.E2[P,x] ⊃ E2[P',x]).
Now suppose we have some finite collection EE of expressions. They
can
References:
Clark, K. L. (1978). Negation as failure,
in Logic and Data Bases, H. Gallaire and J. Minker (eds.) Plenum Press, NY
293-322.
Kowalski, R. (1978) Logic for data description,
in Logic and Data Bases, H. Gallaire and J. Minker (eds.) Plenum Press, NY
77-103.
Reiter, R. (1978) On closed world data bases,
in Logic and Data Bases, H. Gallaire and J. Minker (eds.) Plenum Press, NY
55-76.
Reiter, R. (1982) Circumscription implies predicate completion - sometimes,
in Proceedings of the National Conference on Artificial Intelligence, AAAI-82,
418-420.
Siegel, Pierre and Genevieve Bossu (1981). La saturation au secours
de la non-monotonie. (Saturation to the rescue of non-monotonicity).
(These presentee pour l'obtention du grade de docteur de 3eme cycle
en intelligence artificielle) - obtainable from the authors at
Groupe d'Intelligence Artificelle, Case 901, 70 Route Leon Lachamp
- 13288 Marseille Cedex 2, France.
Remarks:
1. Systematizing prev.
One is attempted to generalize the bird example in the following
way. We wrote prev-can-fly, and mentioned the possibility of
prev-prev-can-fly, but putting prev before can-fly had only
mnemonic significance; it wasn't part of the theory.
We can imagine, however, that for every predicate P, we have
another predicate prev(P). Of course, this would generate an
infinite series of predicates, but since the predicates are all to
be circumscribed, if we have nothing
special to say about a predicate in the series, the series
effectively terminates. The bird axiom might then be written
∀x.is-bird x ⊃ ifunp(can-fly) x,
where the second order function ifunp (standing for "if unprevented") satisfies
∀p x. ifunp(p) x ≡ prev(p)(x) ∨ p(x),
and we have the rule that prev(p) is always circumscribed with
higher probability than p itself. In the bird case, this comes to
∀x.is-bird x ⊃ prev(can-fly) x ∨ can-fly x,
which is essentially the axiom we have been using.
The trivial case is that
within the facts taken into account there is no other mention of
prev(p). prev(p)(x) then circumscribes to false. In the bird
example, if there were no mention of prev(can-fly), the axioms
would be equivalent to the straightforward
∀x.is-bird x ⊃ can-fly x.
Admitting prev(canfly) via
∀x.is-ostrich x ⊃ ifunp(prev(can-fly)) x
gives in general
∀x.is-ostrich x ⊃ prev(prev(can-fly)) x ∨ prev(can-fly) x,
but since there is no other mention of prev(prev(can-fly)), this
reduces to
∀x.is-ostrich x ⊃ prev(can-fly) x.
My only reason for not having adopted this change and
writing the paper using it, is that this one example isn't enough
evidence that making prev a function will be of general utility
in AI.
At present writing, there seems to be some hope that
a substantial amount of common sense knowledge could be expressed
using prev and ordinary predicates, and having as our only
priority rule that prev(P) has higher circumscription priority
than P. If this works, it may tie down the epistemology of
common sense data bases intended to be used with circumscription
enough so that we can begin to write heuristic programs.
2. To cast doubt on a predicate P may be to replace all postive
occurrences of P(e) in the database by P(e) ∨ prev(P)(e). However,
this may be too strong; we may want to be able to cast doubt on
a subset of the occurrences of P.
3. The Reiter tale of two cities casts doubt on the
optimality of our formalism. We have the presumption that
a man lives in the same city as his wife and also the presumption
that he lives in the same city as his employer. In Reiter's
example,
city(wife(Pat)) = Vancouver ∧ city(employer(Pat)) = Toronto.
We can do this example by making city(x,y) a predicate
meaning that x's city is y. We then can write
∀x y.city(wife x, y) ⊃ prev(city)(x,y) ∨ city(x,y)
and similarly
∀x y.city(employer x, y) ⊃ prev(city(x,y)) ∨ city(x,y).
Circumscribing both city and prev(city) with the latter having
higher priority will then have the desired effect. If only
city(wife Pat, Vancouver)
or
city(employer Pat, Toronto)
is taken into account, the circumscription of prev(city)(x)
will assign to Pat the same city as his wife or employer. If
both are taken into account, a disjunction will be produced
and we will conclude eventually
city(Pat, Vancouver) ∨ city(Pat, Toronto).
However, it would be better to be able to use the function city
should we prefer it to the predicate. Maybe if we want to use
functions we need to use partial functions, i.e. to admit bottom
as a value. This certainly would fit with minimization.
3. Is there a notion of a circumscriptive theory, i.e. a set of axioms and
prescriptions for circumscription? Such a theory should have a circumscriptive
completion - an ordinary theory. The notion is non-monotonic in that
the completion of a theory can contain sentences not in the completion of
an extended theory.
Problems to formalize
1. teachers are female
policemen are male
Jane is a feminine name
people with feminine names are female
Jane is a policeman
Evelyn is a policeman
In England Evelyn is often a male name.
2. Tseytin's eample of the peas.
holds(good-crop) ∧ (∀x.¬(holds x ∧ dominates(x,good-crop)) ⊃ holds(goodness)
holds(good-crop) ∧ ¬prev(rules) good-crop ⊃ rules good-crop
holds(burned-out(crop)) ⊃ prev(rules) good-crop.
A Russian folk tale concerns a peasant who sells a chicken
in the market for a good price and gets a good buy on a sack of peas.
"Good", comments the listener. But on the way home the peas leak out
through a hole in the sack. "Bad", comments the listener. But the
peas grow into a good crop by the side of the path. "Good". However,
the crop of peas is eaten by birds. "Bad". The peasant nets the
birds and sells them. "Good". And so on. xx Tseytin told me this tale
as an example of non-monotonic reasoning, and indeed the essential
non-monotonic aspect of it is readily formalized in any of the following
ways.
We shall use a situation variable, although since we don't
bother taking into account the fact that the e
One way of looking at the pea story emphasizes the succession
of events in time. Each reversal of the peasant's fortunes is caused
by an event subsequent to the situation that has been judged. However,
it is a property special to this story that each event is of magnitude
to dominate the preceding events. Therefore, we offer a formalization
that omits time and concentrates on the notion that each of a sequence
of facts prevents the preceding fact from having its normal effect.
Some reification or higher order logic seems to be required to make the
formalization say what we want.
good-crop ∧ ¬prev-rules-good-crop ⊃ good
Here we have something that doesn't quite fit into the earlier
usage. Namely, instead of prev-good-crop, we have
prev-rules-good-crop, i.e. we need to say that the good crop is
prevented from having its usual consequence of good. We can't
use prev-good, because, as is shown by the later twists of the
peasant's fate, good may still be the outcome even if the good
crop is prevented from having its usual effect.
Of course, what negates the effect of the good crop can
either be something related to it like birds eating it or a drop
in farm price supports or something unrelated like the peasant's
wife dying. We need a new formulation. Perhaps we need to actually
distinguish these two kinds of negation - the one an actual prevention
and the other giving goodness a kind of additive character in terms
of the goodness of each feature of the situation.